INTRODUCTION

The use of a computer simulation of the global atmospheric circulation is the most promising way of answering the climatological questions posed by the Climatic Impact Assessment Program. Among the many problems inherent in the use of such models, we have chosen to investigate the following three:

•  How sensitive is the model to changes in boundary conditions? That is, can an observable change in a predicted atmosphere be causally related to the boundary condition change, or is it just an acceptable random state resulting from a slight perturbation or uncertainty in initial conditions?

•  Models  claiming  to  predict  the  same  physical  phenomena 
should  agree  if  they  begin  with  identical  boundary  and 
initial  conditions.  What  is  meant  by  "agreement"  in 
the  presence  of  small  differences  in  initial  conditions 
and  do,  in  fact,  existing  models  agree? 

•  A model, ideally, should predict the real atmosphere. Does this happen? Or, alternatively, what actual comparisons are possible between a model and a very large file of real climatological data?


The eventual purpose of this study is to formulate statistical models, testing procedures and data retrieval programs which will help answer these questions. This paper will describe some of our past and present approaches and experience with the first of the three tasks, model sensitivity, and then briefly describe some future plans for tasks two and three. As regards present work, it is too soon for conclusive results. For this reason, we choose to maintain an informal, discursive tone in this paper, using a minimum of formalism and saving that for the time when definitive results are available.

THE ORIGINAL EXPERIMENT

In all the discussion to follow, the word "model" refers to the Mintz-Arakawa (M-A) 2-level general circulation model (Gates, et al., 1971; Arakawa, 1972). For some time, the almost canonical question (or scapegoat problem) was: "What would be the effect on the climate if all the Arctic sea ice were replaced by water at -1°C?" (Water temperatures in the M-A model are not permitted to vary.) We entered the arena early in 1972 (Warshaw and Rapp, 1972) with the following experiment:

The initial conditions were chosen to simulate the state of the atmosphere for December 31, and the model was run to simulate the next 60 days. This run is called the "control" run. Next, the initial global temperature field at both the $\sigma = 0.25$ and $\sigma = 0.75$ levels was perturbed with an additive random temperature. These random "noise" temperatures were drawn from a normal distribution having zero mean and a 1°C standard deviation. The model was re-run for the same 60 days as the control run. Following this, the original temperature field was again perturbed by new random additive noise drawn from the same distribution, and the model was run again for the same 60 days. We now have 3 runs, identical in all respects except for very small changes in the initial temperature field.
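The perturbation step can be sketched as follows. This is a minimal illustration, not the code used for these runs: the grid shape `(46, 72)`, the placeholder temperatures and the function name `perturb_initial_temperatures` are hypothetical; only the zero-mean, 1°C normal noise added independently at both levels comes from the text.

```python
import numpy as np

def perturb_initial_temperatures(t_upper, t_lower, sd=1.0, rng=None):
    """Add independent N(0, sd^2) noise to the two temperature levels.

    t_upper, t_lower: initial temperature fields at the sigma = 0.25 and
    sigma = 0.75 levels. The input arrays are not modified.
    """
    rng = np.random.default_rng() if rng is None else rng
    noisy_upper = t_upper + rng.normal(0.0, sd, size=t_upper.shape)
    noisy_lower = t_lower + rng.normal(0.0, sd, size=t_lower.shape)
    return noisy_upper, noisy_lower

# Hypothetical lat x lon grid and placeholder temperatures (assumptions);
# real initial fields would come from the model's December 31 state.
shape = (46, 72)
t_upper0 = np.full(shape, 230.0)
t_lower0 = np.full(shape, 270.0)

rng = np.random.default_rng(1972)
# Run 1 is the unperturbed control; runs 2 and 3 each get fresh, independent noise.
initial_states = [(t_upper0, t_lower0)]
for _ in range(2):
    initial_states.append(perturb_initial_temperatures(t_upper0, t_lower0, rng=rng))
```

Passing one generator through both calls guarantees that the two perturbations are independent draws from the same distribution, as the experiment requires.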

The experiment is completed by re-running each of the above 60-day simulations, except that the Arctic sea ice is removed and replaced with sea water. The initial temperature fields are, however, identical to those of runs 1-3.
Using these 6 runs, we formulate an analysis of variance model and proceed to test the hypothesis that removing the Arctic ice had no effect on the climatological variables of interest. The 2 × 3 layout is illustrated in Table 1, where $Y_{ij}$ for the original experiment was the zonal average of the variable, averaged also over the last 30 days of the run. The procedure has been extended to a multivariate test, so that the $Y_{ij}$ may now be vector valued, but $Y_{ij}$ was only univariate in the reported work.


Table 1

ARRAY OF SCALAR VARIABLES

              Unperturbed   Perturbation No. 1   Perturbation No. 2   Row mean
Ice In        Y11           Y12                  Y13                  Y1.
Ice Out       Y21           Y22                  Y23                  Y2.
Column mean   Y.1           Y.2                  Y.3

We may interpret differences between variables in the same row as "noise" in the experiment and differences between variables in the same column as useful "signal." An analysis of variance procedure provides a sharp test of whether the row means, $Y_{i\cdot}$, are significantly different, by eliminating the effects of differences in the column means, $Y_{\cdot j}$. Table 2 reproduces the results given in Warshaw and Rapp (1972), and we see that many changes in temperature, geopotential height, zonal wind and heat transport may be statistically attributed to the removal of the Arctic sea ice.
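For scalar $Y_{ij}$, the row-effect test on Table 1 is the usual two-way analysis of variance with one observation per cell, the perturbation columns acting as blocks. The sketch below is our illustration of that textbook computation, not the authors' program; the function name and the input numbers are hypothetical, and the 5% critical value $F_{0.95}(1, 2) \approx 18.51$ is quoted from standard tables.

```python
import numpy as np

def row_effect_f_test(y):
    """Two-way ANOVA without replication: test for a row (ice in/out) effect.

    y: array of shape (a, b); rows are boundary conditions, columns are the
    initial-condition perturbations. Returns (F, df_rows, df_error).
    """
    y = np.asarray(y, dtype=float)
    a, b = y.shape
    grand = y.mean()
    row_means = y.mean(axis=1)   # Y_i.
    col_means = y.mean(axis=0)   # Y_.j
    ss_rows = b * np.sum((row_means - grand) ** 2)
    ss_cols = a * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((y - grand) ** 2)
    # With one observation per cell, the interaction sum of squares is the error term.
    ss_error = ss_total - ss_rows - ss_cols
    df_rows, df_error = a - 1, (a - 1) * (b - 1)
    f = (ss_rows / df_rows) / (ss_error / df_error)
    return f, df_rows, df_error

# Made-up numbers for illustration only (not results from the paper).
y = [[1.0, 3.0, 2.0],   # ice in: unperturbed, perturbation 1, perturbation 2
     [4.0, 5.0, 6.0]]   # ice out
f, df1, df2 = row_effect_f_test(y)
F_CRIT_1_2 = 18.51      # 5% point of F(1, 2), from standard tables
print(f, df1, df2, f > F_CRIT_1_2)  # 27.0 1 2 True
```

Only 2 error degrees of freedom remain in a 2 × 3 layout, which is why the critical value is so large.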

Table 2

DATA AND STATISTICS FOR TEMPERATURE, GEOPOTENTIAL HEIGHT, ZONAL WIND, POLAR HEAT TRANSPORT, AND POLAR MOMENTUM TRANSPORT

Some important points remain to be made relative to this original experiment. First, it's readily apparent that the economics of this procedure are totally unacceptable. The cost of simulating a single day by use of the Mintz-Arakawa model is approximately $230,* so replicating each run at least three times in order to provide adequate sampling is hardly the way of the future. An intermediate approach might be to run n (≥ 3) replications of the control case, each with some initial condition perturbation, but make only one run of the experiment which embodies the change in boundary conditions. For scalar variates, this is the standard t-test, which gives less sharp results than the F-test while reducing the computing by only 30 percent. Ideally we'd like to make only one control run and one experimental run. Below is an example of how to get into trouble with this approach, followed by a discussion of methods under study to extract more of the needed information from the sample provided by only one control and one experimental run.
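For a scalar variate, one common form of the intermediate comparison treats the single experimental run as one new observation and asks whether it falls outside the spread of the n control replicates. The paper does not write the statistic out, so the form used here, $t = (y_e - \bar{y}_c)/(s_c\sqrt{1 + 1/n})$ with $n - 1$ degrees of freedom, is our assumption, and the function name is hypothetical.

```python
import math
from statistics import mean, stdev

def single_experiment_t(control, experiment):
    """t statistic for one experimental value against n control replicates.

    control: values from n >= 2 perturbed control runs.
    experiment: the value from the single boundary-condition run.
    Returns (t, degrees_of_freedom). Assumed form, see lead-in.
    """
    n = len(control)
    if n < 2:
        raise ValueError("need at least two control replicates")
    m = mean(control)
    s = stdev(control)  # sample standard deviation (divisor n - 1)
    # Under the null hypothesis, var(experiment - m) = sigma^2 * (1 + 1/n).
    t = (experiment - m) / (s * math.sqrt(1.0 + 1.0 / n))
    return t, n - 1

# Hypothetical numbers: three control replicates and one experimental run.
t, df = single_experiment_t([1.0, 2.0, 3.0], 5.0)
print(round(t, 3), df)  # 2.598 2
```

With only two degrees of freedom the two-sided 5% critical value of t is about 4.30, so even this sizable departure would not be judged significant.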

THE CASE OF THE ALEUTIAN LOW

Figure 1 shows the location (latitude and longitude) and pressure of a local minimum in the North Pacific called the Aleutian low. There are six data points, 3 for the ice-in simulations and 3 for ice-out. Originally it was claimed, on the basis of the two unperturbed data points, that the removal of the Arctic ice caused a significant lowering of the pressure and a westward shift of the low pressure center (Fletcher, 1971). Upon running the simulation four more times with the perturbations already described, it became clear (and you don't need a statistician to see it) that the shift and lowering of the pressure were well within the random variation of the triple of numbers (latitude, longitude, pressure).

This  is  exactly  the  sort  of  problem  which  will  have  to  be  faced 
over  and  over  again  when  searching  simulation  results  for  significant 
change  in  the  presence  of  the  very  high  cost  of  additional  samples. 


*Using the IBM 360/91 computer at UCLA.
Fig. 1. Location and pressure of the Aleutian low, 30-day mean, Arctic ice-in vs. ice-out (3 samples of each).
PRESENT WORK IN HYPOTHESIS TESTING USING TWO SIMULATION RUNS

The central problem in assessing the sensitivity of a climatic simulation is simply how to get enough independent samples of both the control and the experiment to permit some reasonably sharp form of hypothesis testing. Instead of making independent runs as described above, we are trying to extract equivalent information from the temporal sequence of data generated by a single run.

Remembering that this is not a report on final results, but only an interim statement, we'll outline procedures which are now undergoing tests with both simulated and real data.

Assume the model produces a sequence of data, i.e., a variable of meteorological interest, possibly vector valued, $X_1, X_2, \ldots, X_n$. For example, $X_i$ might be the array representing the 800 mb temperatures at every grid point over North America, and the subscript indexes time.

For the Mintz-Arakawa model, $X_i$ and $X_{i+1}$ are separated by 6 hours. One procedure is to take all the available data (if the simulation were 60 days long we'd have $X_1, X_2, \ldots, X_{240}$) and find the smallest $k$ such that the subsequence $X_1, X_{1+k}, X_{1+2k}, \ldots, X_{1+nk}$ ($1 + nk \le 240$) has the following properties:

•  The subsequence has a sufficiently small estimated first order correlation coefficient, $r_1$. We expect the Durbin-Watson test for absence of first order correlation to provide the appropriate testing procedure. That is, if we accept the null hypothesis from the Durbin-Watson test, we'll claim the subsequence is uncorrelated.

•  The subsequence has no significant cyclical component. The turning point test (Kendall, 1963) provides a convenient test of this property. The turning point test counts the number of "peaks" and "troughs" in the time series and tests the hypothesis that this number came from a series of independent random variables.

•  Last, the sequence should have no significant linear trend. To test for this property, we apply the Kendall rank correlation test (Kendall, 1963). This test takes the sequence $X_1, X_2, \ldots, X_n$ and counts the number of pairs in which $X_j > X_i$, $j > i$. This number, $N$, is compared with the expected number in a random series, and the excess (or deficiency) over the expected number indicates a tendency toward a positive (or negative) trend.
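The three checks and the search for the smallest spacing $k$ can be sketched for a scalar series as follows. This is our illustration under stated simplifications: the Durbin-Watson bounds tables are replaced by the large-sample rule $|1 - d/2| < 1.96/\sqrt{n}$ (using $d \approx 2(1 - r_1)$); the turning point and Kendall tests use their usual normal approximations; all three run at a nominal 5% level; and the function names and the `min_size` default are hypothetical.

```python
import math

Z_CRIT = 1.96  # two-sided 5% point of the standard normal

def durbin_watson(x):
    """Durbin-Watson d computed on deviations from the sample mean."""
    m = sum(x) / len(x)
    e = [v - m for v in x]
    num = sum((e[t] - e[t - 1]) ** 2 for t in range(1, len(e)))
    return num / sum(v * v for v in e)

def count_turning_points(x):
    """Number of interior peaks and troughs."""
    return sum(1 for i in range(1, len(x) - 1)
               if (x[i] > x[i - 1] and x[i] > x[i + 1])
               or (x[i] < x[i - 1] and x[i] < x[i + 1]))

def turning_point_z(x):
    """Normal approximation: mean 2(n-2)/3, variance (16n-29)/90."""
    n = len(x)
    expected = 2.0 * (n - 2) / 3.0
    variance = (16.0 * n - 29.0) / 90.0
    return (count_turning_points(x) - expected) / math.sqrt(variance)

def count_increasing_pairs(x):
    """N = number of pairs (i, j), j > i, with x[j] > x[i]."""
    n = len(x)
    return sum(1 for i in range(n) for j in range(i + 1, n) if x[j] > x[i])

def kendall_trend_z(x):
    """Normal approximation for Kendall's trend statistic (assumes no ties)."""
    n = len(x)
    pairs = n * (n - 1) / 2.0
    s = 2.0 * count_increasing_pairs(x) - pairs  # increasing minus decreasing pairs
    variance = n * (n - 1) * (2 * n + 5) / 18.0
    return s / math.sqrt(variance)

def passes_independence_checks(x):
    n = len(x)
    # Assumption: large-sample stand-in for the Durbin-Watson bounds tables.
    r1_approx = 1.0 - durbin_watson(x) / 2.0
    return (abs(r1_approx) < Z_CRIT / math.sqrt(n)
            and abs(turning_point_z(x)) < Z_CRIT
            and abs(kendall_trend_z(x)) < Z_CRIT)

def smallest_spacing(series, min_size=10):
    """Smallest k whose subsequence x[0], x[k], x[2k], ... passes all checks."""
    k = 1
    while len(series[::k]) >= min_size:
        if passes_independence_checks(series[::k]):
            return k
        k += 1
    return None
```

A strongly trending series such as `list(range(240))` never passes, so `smallest_spacing` returns `None`; in practice the search would be run separately on the control and the experimental series.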


If we fail to reject the null hypothesis in all three tests, we take this as an adequate demonstration that the random variables in the subsequence $X_1, X_{1+k}, \ldots$ are uncorrelated, and we'll treat them as independent samples. Assuming that there exists a sufficiently small $k$ (or, equivalently, a sufficiently large sample) which still satisfies the above requirements for both the control and the experiment runs, we proceed as follows:


•  Using the independent samples from each run, estimate the covariance matrix, $G$, for the vector variables, $X_i$. That is, we estimate the covariance between the same meteorological variables at different spatial locations (say, for instance, over the grid points covering North America).

•  We ask that the sample size be large enough to permit the inversion of the covariance matrix. Too few samples, while permitting a rough estimation of $G$, leave $G$ singular, and we require $G^{-1}$ to formulate a test statistic.
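The two steps above amount to the following sketch, which is our illustration rather than the authors' code: the sample covariance of the $T$ retained samples, plus a check that it can be inverted. The function name and the use of `numpy.linalg.matrix_rank` as the singularity check are our choices.

```python
import numpy as np

def estimate_covariance(samples):
    """Sample covariance G of T independent samples of an n-vector.

    samples: array of shape (T, n), e.g. one row per retained time step and
    one column per grid point. Returns (G, invertible).
    """
    x = np.asarray(samples, dtype=float)
    t, n = x.shape
    g = np.atleast_2d(np.cov(x, rowvar=False))  # divisor T - 1
    # G has rank at most T - 1, so inversion needs T >= n + 1.
    invertible = bool(t >= n + 1 and np.linalg.matrix_rank(g) == n)
    return g, invertible

rng = np.random.default_rng(0)
g_small, ok_small = estimate_covariance(rng.normal(size=(5, 8)))   # T = 5, n = 8
g_large, ok_large = estimate_covariance(rng.normal(size=(50, 8)))  # T = 50, n = 8
print(ok_small, ok_large)  # False True
```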

The investigator, and let's assume he's a meteorologist, is now faced with the first of several interesting questions. They concern the assumptions he's willing to make about the behavior of the real atmosphere in order to achieve a sharper test in discriminating between the control and experimental run. For instance, should he assume, a priori, that nature would generate identical covariance matrices for identical regions independent of whether the control or experimental condition is being observed? If the answer is yes, we may proceed directly to the final hypothesis testing stage. If the answer is no, the investigator may call for an equality of covariance test (Anderson, 1958), and if the test rejects the hypothesis of equality, we have the classical Behrens-Fisher problem (that of testing for the equality of means when the variances are unknown and unequal), for which the hypothesis tests are just not as good as in the case of equal covariances. Remember here that a priori knowledge is better than blindly testing for equality, because there is always a non-zero probability of rejecting the hypothesis of equality when, in fact, it is true. Moreover, the equality of covariance test is not very good for large covariance matrices.

In any event, we're finally prepared to use Hotelling's $T^2$-test (Anderson, 1958) to test the hypothesis that the control and experimental runs have the same mean value, where, again, we mean a vector of mean values over the geographical area of interest.
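A minimal sketch of the two-sample Hotelling $T^2$ test, assuming (as in the equal-covariance branch above) a common covariance matrix estimated by pooling the two runs; the function name is hypothetical. It returns $T^2$ and the equivalent $F = \frac{n_1 + n_2 - p - 1}{p(n_1 + n_2 - 2)}\,T^2$, which has $(p,\ n_1 + n_2 - p - 1)$ degrees of freedom under the null hypothesis; this is the standard conversion.

```python
import numpy as np

def hotelling_two_sample(x, z):
    """Two-sample Hotelling T^2 with a pooled (common) covariance matrix.

    x: (n1, p) samples from the control run; z: (n2, p) from the experiment.
    Returns (T2, F, (df1, df2)).
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z, dtype=float)
    n1, p = x.shape
    n2 = z.shape[0]
    d = x.mean(axis=0) - z.mean(axis=0)
    s1 = np.atleast_2d(np.cov(x, rowvar=False))
    s2 = np.atleast_2d(np.cov(z, rowvar=False))
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    # Solve pooled @ v = d rather than forming the inverse explicitly.
    t2 = n1 * n2 / (n1 + n2) * float(d @ np.linalg.solve(pooled, d))
    f = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * t2
    return t2, f, (p, n1 + n2 - p - 1)

# With p = 1 this reduces to the square of the pooled two-sample t statistic.
t2, f, dfs = hotelling_two_sample([[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]])
print(t2, f, dfs)  # 13.5 13.5 (1, 4)
```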

FILTERING  THE  DATA  AND  COVARIANCE  MATRIX 

In order to get to the $T^2$-test for equality of two mean vectors, the original data had to meet quite a few criteria. Highly correlated data can only do this at the cost of decreasing sample size. In other words, the more we separate the samples in time trying to achieve independence, the fewer samples we'll have left.

There are two procedures which promise to save the really difficult cases:

If we have any reason to believe that the data $\{X_i\}$ are generated by a process which is primarily a Markov process, so that successive values of $X$ are represented by the equation

$$X_i = \rho_1 X_{i-1} + U_i \qquad (1)$$

where $U_i$ is a random variable and $\rho_1$ is the first order correlation coefficient, then the transformation of $\{X_i\}$ to the uncorrelated sequence $\{Y_i\}$ is given by the relationship


$$Y_i = X_i - \hat{\rho}_1 X_{i-1} \qquad (2)$$

where $\hat{\rho}_1$ is the first order correlation coefficient estimated from the sample $\{X_i\}$.
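Eq. (2) can be sketched for a scalar series as follows. The estimator of $\hat{\rho}_1$ (lag-one sample autocorrelation about the mean), the synthetic Markov test series and the function names are our assumptions for illustration.

```python
import numpy as np

def lag_one_correlation(x):
    """Estimated first order (lag-one) correlation coefficient about the mean."""
    x = np.asarray(x, dtype=float)
    e = x - x.mean()
    return float(np.sum(e[1:] * e[:-1]) / np.sum(e * e))

def markov_transform(x, rho=None):
    """Y_i = X_i - rho * X_{i-1}, as in Eq. (2); rho defaults to the sample estimate."""
    x = np.asarray(x, dtype=float)
    if rho is None:
        rho = lag_one_correlation(x)
    return x[1:] - rho * x[:-1]

# Synthetic first-order Markov series (assumption): mean 10, correlation 0.9.
rng = np.random.default_rng(42)
n, rho_true, mu = 2000, 0.9, 10.0
x = np.empty(n)
x[0] = mu
for i in range(1, n):
    x[i] = mu + rho_true * (x[i - 1] - mu) + rng.normal()

y = markov_transform(x)
print(round(lag_one_correlation(x), 2), round(lag_one_correlation(y), 2))
```

The transformed series is one element shorter than the original, and its lag-one correlation should be close to zero when the data really follow Eq. (1).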

Now, the closer the sequence $\{X_i\}$ comes to behaving as if it were generated by Eq. (1), the closer $\{Y_i\}$ comes to being an uncorrelated sequence. All well and good, except when we examine what effect the transformation has had on the moments. Assume now that we actually have two vector sequences, $\{X_i\}$ and $\{Z_i\}$, represented in the sample data from the control and experimental runs. To demonstrate this argument, assume both $X_i$ and $Z_i$ have identical mean value, $\mu$, and variance, $\sigma^2$. Hence we would expect to accept a hypothesis made about the equality of means, since, by definition, they're equal. Now, transform $\{X_i\}$ and $\{Z_i\}$ according to


$$Y_i = X_i - \hat{\rho}_x X_{i-1} \qquad (3)$$

$$W_i = Z_i - \hat{\rho}_y Z_{i-1} \qquad (4)$$


where $\hat{\rho}_x$ and $\hat{\rho}_y$ are the estimated first order correlation coefficients and are themselves random variables. The sequences $\{Y_i\}$ and $\{W_i\}$ have the following moments:


$$E[Y_i] = (1 - \hat{\rho}_x)\,\mu \qquad (5)$$

$$E[W_i] = (1 - \hat{\rho}_y)\,\mu \qquad (6)$$

$$\operatorname{var}(Y_i) = (1 - \hat{\rho}_x^2)\,\sigma^2 \qquad (7)$$

$$\operatorname{var}(W_i) = (1 - \hat{\rho}_y^2)\,\sigma^2 \qquad (8)$$


Notice that the transformation has the favorable property of reducing the variance of each sample, but unfortunately, since $\hat{\rho}_x$ and $\hat{\rho}_y$ are random variables, there is no reason why $E[Y_i]$ and $E[W_i]$ should be equal. In practice, for very large $\hat{\rho}_x$ and $\hat{\rho}_y$, the transformation introduces such a large bias in the means that, even with the variance-reducing property, the transformation is not of much value.

However, if by testing for the equality of $\rho_x$ and $\rho_y$ (say, using the Fisher Z-transformation; Hoel, 1956) we find them equal (or if we argue from the physics that one would expect them to be equal, a risky proposal), we could define a mean correlation coefficient, $\bar{\rho} = \tfrac{1}{2}(\hat{\rho}_x + \hat{\rho}_y)$, and the Markov process transformation could then be of great value, since it introduces no extra uncertainty into the problem.
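The equality test and the pooled transformation can be sketched as below. Applying Fisher's z-transformation to lag-one autocorrelations, and using the run lengths $n_x$, $n_y$ in the usual $1/(n - 3)$ variance, are our simplifying assumptions; the function names and the numbers in the usage lines are hypothetical.

```python
import math

def fisher_z_equal_correlations(r_x, n_x, r_y, n_y):
    """Approximate normal statistic for H0: rho_x = rho_y, via z = atanh(r)."""
    se = math.sqrt(1.0 / (n_x - 3) + 1.0 / (n_y - 3))
    return (math.atanh(r_x) - math.atanh(r_y)) / se

def pooled_markov_transform(x, z, r_x, r_y):
    """Transform both runs with the common coefficient rho_bar = (r_x + r_y) / 2."""
    rho_bar = 0.5 * (r_x + r_y)
    y = [x[i] - rho_bar * x[i - 1] for i in range(1, len(x))]
    w = [z[i] - rho_bar * z[i - 1] for i in range(1, len(z))]
    return rho_bar, y, w

# Hypothetical lag-one estimates from two runs of 240 retained steps each.
r_x, r_y = 0.62, 0.58
z_stat = fisher_z_equal_correlations(r_x, 240, r_y, 240)
if abs(z_stat) < 1.96:  # equality not rejected at the 5% level
    print("pooled coefficient:", 0.5 * (r_x + r_y))
```

Because both runs are transformed with the same $\bar{\rho}$, equal underlying means stay equal after the transformation, which is the property the text relies on.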

Note that even with all these ancillary tests and assumptions, the Markov process transformation doesn't make for sharper discrimination in the final testing of mean values, since the new means are closer together in a way that is not compensated for by the reduction of variance. What we might get, however, is a sufficiently large uncorrelated sample so that at least some hypothesis testing about the equality of means can be done.
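A short worked version of this argument, under simplifying assumptions of our own: both runs share a known lag-one correlation $\rho$ (so $\hat{\rho}_x = \hat{\rho}_y = \rho$) and a common variance $\sigma^2$, while their means $\mu_X$ and $\mu_Z$ may differ. Since $\operatorname{cov}(X_i, X_{i-1}) = \rho\sigma^2$,

$$\operatorname{var}(Y_i) = \sigma^2 + \rho^2\sigma^2 - 2\rho\cdot\rho\sigma^2 = (1 - \rho^2)\,\sigma^2,$$

while the means become $(1-\rho)\mu_X$ and $(1-\rho)\mu_Z$. The standardized difference between the runs is therefore

$$\frac{(1-\rho)\,|\mu_X - \mu_Z|}{\sigma\sqrt{1-\rho^2}} = \sqrt{\frac{1-\rho}{1+\rho}}\;\frac{|\mu_X - \mu_Z|}{\sigma},$$

which is smaller than the untransformed $|\mu_X - \mu_Z|/\sigma$ whenever $0 < \rho < 1$. For $\rho = 0.8$ the factor is $\sqrt{0.2/1.8} = 1/3$. The gain is only that every $Y_i$, rather than every $k$-th sample, can be treated as independent.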

Another, even more ad hoc, procedure might be adopted if the sample size is so small that the covariance matrix is singular. If $G$ is an $n \times n$ covariance matrix, then $G$ is non-singular if $T \ge n + 1$, where $T$ is the sample size. In cases where this inequality is not satisfied, we might try formulating the matrix of correlation coefficients, $R$, associated with $G$. Then, whenever $|r_{ij}| < r^*$, set $g_{ij} = 0$, where $r^*$ is some small number. In other words, choose an $r^*$, say $r^* = 0.1$. Then, if the estimated correlation coefficient is less than $r^*$ in magnitude, we arbitrarily set the corresponding covariance $g_{ij} = 0$. This tends to eliminate small values of $g_{ij}$, which contribute very little to the structure of the problem but do influence the singularity of $G$. If this is at all acceptable, the trick will be to find an approximation to the smallest $r^*$ which makes $G$ non-singular. Trial and error might suffice.
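The thresholding rule and the trial-and-error search for $r^*$ can be sketched as follows. The grid of trial values, the use of `numpy.linalg.matrix_rank` to judge singularity, and leaving the diagonal untouched are our assumptions; zeroing entries this way is not guaranteed to keep $G$ positive definite, so an accepted matrix is still worth checking before it is inverted.

```python
import numpy as np

def threshold_covariance(g, r_star):
    """Zero each off-diagonal g_ij whose correlation |r_ij| is below r_star."""
    g = np.asarray(g, dtype=float)
    sd = np.sqrt(np.diag(g))
    r = g / np.outer(sd, sd)  # correlation matrix R associated with G
    # Assumption: the diagonal (variances) is never thresholded.
    small = (np.abs(r) < r_star) & ~np.eye(g.shape[0], dtype=bool)
    out = g.copy()
    out[small] = 0.0
    return out

def smallest_nonsingular_threshold(g, trial_values=np.linspace(0.0, 1.0, 101)):
    """First r_star (in increasing order) that gives a full-rank thresholded G."""
    n = g.shape[0]
    for r_star in trial_values:
        g_t = threshold_covariance(g, r_star)
        if np.linalg.matrix_rank(g_t) == n:
            return float(r_star), g_t
    return None

# T = 5 samples of an 8-vector: the raw sample covariance is singular.
rng = np.random.default_rng(3)
g = np.cov(rng.normal(size=(5, 8)), rowvar=False)
result = smallest_nonsingular_threshold(g)
```

At $r^* = 1$ every off-diagonal entry with $|r_{ij}| < 1$ is zeroed, leaving a diagonal matrix of positive variances, so the search always stops by then unless two variables are perfectly correlated.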

MODEL  COMPARISONS 

There is also the problem of comparing two models or comparing one model to the real atmosphere. Not nearly as much progress has been made here. A trial comparison has been made between the 3-D Mintz-Arakawa model and the zonally averaged model (ZAM) of MacCracken. This was done to test algorithms which produce compatible initial conditions, and the actual comparison of predicted results wasn't completed.

The comparison is done by computing the zonal averages of M-A results after the run has been made and then comparing these zonal averages with those produced directly by ZAM. Work will accelerate on the comparison task, but we foresee the problem of deciding what constitutes model agreement, and of supplying the requisite physical explanation should the models fail to agree.
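The zonal-averaging step of the comparison can be sketched as below. We assume the M-A field is stored on a regular latitude-longitude grid with longitude as the last axis, so an unweighted mean along that axis is the zonal average, and that the ZAM output is already a profile on the same latitudes. The root-mean-square difference (unweighted in latitude) is one possible agreement measure, not one proposed in the text; function names and numbers are hypothetical.

```python
import numpy as np

def zonal_mean(field):
    """Average over longitude (last axis) on a regular lat-lon grid."""
    return np.asarray(field, dtype=float).mean(axis=-1)

def compare_with_zam(ma_field, zam_profile):
    """Latitude-by-latitude difference and its RMS (unweighted in latitude)."""
    diff = zonal_mean(ma_field) - np.asarray(zam_profile, dtype=float)
    return diff, float(np.sqrt(np.mean(diff ** 2)))

# Hypothetical 3-latitude x 4-longitude M-A field and a matching ZAM profile.
ma = np.array([[250.0, 252.0, 248.0, 250.0],
               [270.0, 271.0, 269.0, 270.0],
               [290.0, 290.0, 290.0, 290.0]])
diff, rms = compare_with_zam(ma, [250.0, 270.0, 291.0])
```

Weighting the latitude average by the cosine of latitude would be a natural refinement if the comparison is meant to reflect area.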

The availability of real climatological data should help here. We are preparing a 5-year data base using NCAR raw data. This consists of temperature, geopotential height, wind and relative humidity every 12 hours. We stress the importance of having the sequential data at least daily and of ensuring that these data are readily accessible to the investigator, and we are implementing a data retrieval system which permits easy comparison between the real climate and the M-A results. The same should be true of ZAM data if we make minor variations in our retrieval program, so there's a lot that can and will be done in the realm of three-way comparisons.